Bio-mathematics, Statistics and Nano-Technologies: Mosquito Control Strategies
Table 9.2: The architecture of ANN models aimed for prediction of Rindex based on physicochemical properties and some statistical parameters of the networks.

| Network architecture | Rtrain | Rtest | Rvalid | RMSEtrain | RMSEtest | RMSEvalid | Training algorithm | Hidden activation function | Output activation function |
|---|---|---|---|---|---|---|---|---|---|
| MLP 3-6-1 | 0.9183 | 0.9116 | 0.9998 | 100.7 | 12.8 | 38.6 | BFGS 24* | Tanh | Exponential |
| MLP 3-8-1 | 0.9396 | 0.8294 | 0.9989 | 75.6 | 10.4 | 67.3 | BFGS 24* | Logistic | Logistic |
| MLP 3-6-1 | 0.9103 | 0.6379 | 0.9990 | 117.5 | 31.3 | 104.7 | BFGS 10* | Exponential | Logistic |

*the number of training cycles after which the best network architecture is reached
et al. 2008. The resulting high-quality model is intended for predicting the repellent
activity of novel compounds structurally similar to those used in the ANN modeling.
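To make the "MLP 3-6-1" notation in Table 9.2 concrete, the following is a minimal sketch of a forward pass through a network with 3 inputs, 6 hidden neurons and 1 output, using the Tanh hidden activation and Exponential output activation listed for the first network. The function name `mlp_forward` and all weight values are hypothetical placeholders for illustration; they are not the trained parameters of the published model.

```python
import math

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a single-hidden-layer perceptron.

    x        : list of input descriptors (3 values for an MLP 3-6-1)
    w_hidden : one weight row per hidden neuron (6 rows of 3 weights)
    b_hidden : one bias per hidden neuron (6 values)
    w_out    : one weight per hidden neuron (6 values)
    b_out    : bias of the single output neuron
    """
    # Hidden layer: weighted sum of the inputs followed by tanh
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    # Output layer: weighted sum of hidden activations followed by exp
    return math.exp(sum(w * h for w, h in zip(w_out, hidden)) + b_out)

# Hypothetical weights, chosen only to show the expected shapes
x = [0.2, -0.5, 1.0]
w_hidden = [[0.1, -0.2, 0.3] for _ in range(6)]
b_hidden = [0.0] * 6
w_out = [0.5] * 6
b_out = 0.1
predicted = mlp_forward(x, w_hidden, b_hidden, w_out, b_out)
```

Because the output activation is exponential, the prediction is always positive, whatever the weights.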
9.3.4 Mathematical validation of QSAR models
A mathematical model cannot be considered reliable or applicable unless it has been
statistically validated using a proper validation approach (Gramatica and Sangion 2016;
Chirico and Gramatica 2012; Chirico and Gramatica 2011). Standard validation parameters
include the Pearson correlation coefficient (R), the determination coefficient (R²), the
adjusted determination coefficient (R²adj), the Fisher test value (F-value), the root mean
square error (RMSE) and the probability (p-value). As part of the internal validation of
the models, cross-validation (also known as out-of-sample testing) is often applied. This
heuristic validation method omits one or more objects from the training set, builds the
model on the remaining compounds, and then estimates the activity of the removed
compounds with the newly established QSAR model (Gramatica and Sangion 2016; Chirico
and Gramatica 2012; Chirico and Gramatica 2011). These cycles are repeated for all the
compounds in the training set. Finally, the predictivity of the QSAR model is judged on
the basis of the following parameters: the cross-validation determination coefficient
(R²cv), the total sum of squares (TSS), the predicted residual error sum of squares
(PRESS), the PRESS/TSS ratio and the predicted standard deviation (SDPRESS).
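The leave-one-out variant of the procedure described above can be sketched as follows for a univariate linear model. This is a minimal illustration, not the authors' implementation: the function names `fit_line` and `loo_cross_validation` and the toy data are assumptions, and R²cv is computed as 1 - PRESS/TSS, which is the usual definition. SDPRESS is left out because its normalisation differs between sources.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def loo_cross_validation(x, y):
    """Leave-one-out cross-validation of a univariate linear model."""
    press = 0.0
    for i in range(len(y)):
        # Omit compound i and refit the model on the remaining compounds
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        a, b = fit_line(x_train, y_train)
        # Accumulate the squared error made on the omitted compound
        press += (y[i] - (a + b * x[i])) ** 2
    y_bar = mean(y)
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return {
        "PRESS": press,
        "TSS": tss,
        "PRESS/TSS": press / tss,
        "R2cv": 1.0 - press / tss,
    }

# Toy data (hypothetical descriptor values and activities)
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
activity = [1.1, 1.9, 3.2, 3.8, 5.1]
cv_stats = loo_cross_validation(descriptor, activity)
```

A PRESS/TSS ratio close to zero (R²cv close to one) indicates that the model predicts omitted compounds almost as well as it fits them.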
One of the most reliable approaches to validating QSAR models is external validation,
in which an external test set is kept out of the training set and, once the modeling is
complete, is used to test the real predictivity of the QSAR model. The parameters of
external validation are the determination coefficient for the prediction of the external
set (R²ext), mathematically different formulations of the determination coefficient for
external validation, including Q²F1, Q²F2, Q²F3 and r²m, as well as the concordance
correlation coefficient (CCC). A detailed explanation of the validation of QSAR models
can be found elsewhere (Gramatica and Sangion 2016; Chirico and Gramatica 2012; Chirico
and Gramatica 2011).
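As an illustration of how several of the external validation parameters named above differ, the sketch below computes Q²F1, Q²F2, Q²F3 and CCC from observed and predicted activities, following the commonly cited definitions. The function name `external_validation` and the example values are hypothetical, and r²m is omitted because it additionally requires a regression through the origin.

```python
from statistics import mean

def external_validation(y_train, y_ext, y_pred):
    """External validation metrics from observed and predicted values.

    y_train : observed activities of the training set
    y_ext   : observed activities of the external test set
    y_pred  : model predictions for the external test set
    """
    n_tr, n_ext = len(y_train), len(y_ext)
    mean_tr, mean_ext, mean_pred = mean(y_train), mean(y_ext), mean(y_pred)

    # Squared prediction errors on the external set
    press_ext = sum((yo - yp) ** 2 for yo, yp in zip(y_ext, y_pred))
    # External observations scattered around the training-set mean (Q2F1)
    ss_ext_tr = sum((yo - mean_tr) ** 2 for yo in y_ext)
    # External observations scattered around their own mean (Q2F2, CCC)
    ss_ext_ext = sum((yo - mean_ext) ** 2 for yo in y_ext)
    # Total sum of squares of the training set (Q2F3)
    tss_tr = sum((yt - mean_tr) ** 2 for yt in y_train)
    # Scatter of the predictions and their co-variation with observations (CCC)
    ss_pred = sum((yp - mean_pred) ** 2 for yp in y_pred)
    cov = sum((yo - mean_ext) * (yp - mean_pred) for yo, yp in zip(y_ext, y_pred))

    return {
        "Q2F1": 1 - press_ext / ss_ext_tr,
        "Q2F2": 1 - press_ext / ss_ext_ext,
        "Q2F3": 1 - (press_ext / n_ext) / (tss_tr / n_tr),
        "CCC": 2 * cov / (ss_ext_ext + ss_pred + n_ext * (mean_ext - mean_pred) ** 2),
    }

# Hypothetical observed and predicted activities
metrics = external_validation(
    y_train=[1.0, 2.0, 3.0, 4.0, 5.0],
    y_ext=[2.0, 4.0],
    y_pred=[2.5, 3.5],
)
```

The three Q² variants share the same numerator and differ only in the reference variance: the external set around the training mean (F1), the external set around its own mean (F2), or the training set itself, with both sums scaled by set size (F3).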
The statistical parameters of the established linear models from subsection 3.1.1
are presented in Table 2. The results indicate that the MLR2 model has the highest
predictivity, with the highest R²cv coefficient and the lowest error (the lowest RMSE),
whereas the ULR model makes the largest prediction error and may be used for approximate
estimation of Rindex for compounds structurally similar to those used in the model's
calibration. According to the values of R²adj it can be concluded